CHAPTER I



INTRODUCTION 
Emerson Foulke* 



Of all of the forms of communication in which humans engage, perhaps 
the most important is that which depends upon the interaction of one or 
more speakers and one or more listeners. However, we have tended 
to take it for granted, and we have not subjected it to the intensive 
scrutiny given to other forms of communication, such as written com- 
munication. 

Throughout most of man's history, proximity has been a necessary 
condition for communication between speakers and listeners. However, 
because of the radio and the telephone, both of which have been devel- 
oped largely within this century, spatial proximity is no longer a nec- 
essary condition and, because human speech can now be recorded for 
subsequent reproduction, temporal proximity is no longer a necessary 
condition either. 

Until recently, there has been no way to gain significant control over 
the rate of communication between speakers and listeners. This rate 
has been determined primarily by the cognitive and articulatory limi- 
tations of speakers, and has not been amenable to the preferences or 
capabilities of listeners. However, with the advent of methods such 
as the one described by Grant Fairbanks and his co-workers at the 
University of Illinois (Fairbanks, Everitt, & Jaeger, 1954), it has become
possible to vary the rate of recorded speech without materially
affecting its other parameters. 

In the past few decades, there has been a growing awareness of the edu- 
cational importance of the communication that takes place between 
speakers and listeners, and of the possibilities afforded by modern 
communication technology for increasing its flexibility and efficiency 
as an educational tool. One manifestation of this new awareness is the 
growing number of courses offered in educational and industrial settings
for the purpose of improving listening skills. A special interest has been
expressed by those concerned with the education of children who, for
whatever reason, must place extraordinary reliance on speaking and
listening in order to communicate. Blind school children, for instance,
depend heavily upon reading by listening because they cannot read print
and because they read braille so slowly. There is an increasing awareness
on the part of educators of the large number of children without visual
impairment who have serious reading problems that do not yield to
remedial efforts, and the advantage they might gain from reading by
listening is beginning to receive attention. Much of the instruction
provided by college and industry depends upon aural communication, and
the feasibility of increasing the rate of recorded speech suggests
intriguing possibilities for more efficient utilization of the limited time
available for instruction.

*Dr. Emerson Foulke, Director of the Perceptual Alternatives Laboratory,
University of Louisville, is Editor of the Proceedings of the Second
Louisville Conference on Rate and/or Frequency-Controlled Speech.

The ability to reduce the rate of recorded speech may also be valuable. 
Word rates that are slower than normal may, in some cases, be more 
compatible with the cognitive abilities of mentally retarded children. 
Students of foreign language and individuals with problems of articula- 
tion may profit by the opportunity to hear the phonetic components of 
spoken words at a slower rate. Recorded speech presented at a rate 
that is slower than normal may afford a technique for pacing slow readers 
or students of typing. Secretaries may be able to transcribe recorded 
dictation more efficiently when they listen to speech, the word rate of 
which has been reduced. 

The first method for altering the word rate of recorded speech to receive 
the attention of investigators (Fletcher, 1929; Klumpp & Webster, 1961) 
was the reproduction of a tape or record at a different speed than the one 
used during recording. This method achieves the desired effect as far 
as word rate is concerned. Reproduction at a faster speed increases 
word rate, while reproduction at a slower speed reduces word rate. 
However, in either case, serious distortion is introduced that soon ren- 
ders words unintelligible. Fortunately, the method developed by Fair- 
banks et al. is now available. This is a sampling method in which 
periodic samples of a recorded signal are reproduced in order and with 
temporal contiguity. If the duration of the samples discarded by this 
procedure is brief enough, the ear will not be able to detect their absence, 
and if the time required for the reproduction of each critical feature of a 
speech signal is greater than the time represented by each discarded 
sample, it will be impossible for the critical feature to fall entirely
within a discarded sample. These conditions are satisfactorily met when
discarded samples are 30 milliseconds (msec.) in duration or less, and
the result is speech, the word rate of which has been increased without 
distortion in pitch or voice quality. A recorded signal may be expanded 
in time by reproducing overlapping samples of that signal and the result 
is a reduction in word rate without pitch distortion. 
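
The sampling procedure just described can be illustrated with a minimal
Python sketch. It is not the Fairbanks apparatus, which is
electromechanical; it simply treats a digitized recording as a list of
samples. The function names, the rate of 8,000 samples per second, and the
20 and 40 msec intervals are assumptions chosen for illustration; only the
30 msec upper bound on the discarded interval comes from the text.

```python
def compress(signal, rate_hz, keep_ms, discard_ms):
    """Keep keep_ms of signal, drop the next discard_ms, and repeat.

    Retained samples are abutted in their original order, so the result
    is shorter but each retained piece is unaltered.
    """
    keep = int(rate_hz * keep_ms / 1000)
    discard = int(rate_hz * discard_ms / 1000)
    if keep <= 0 or discard < 0:
        raise ValueError("keep_ms must be positive, discard_ms non-negative")
    out = []
    for start in range(0, len(signal), keep + discard):
        out.extend(signal[start:start + keep])
    return out


def expand(signal, rate_hz, sample_ms, overlap_ms):
    """Reproduce overlapping samples of the signal.

    Each piece of sample_ms begins overlap_ms before the end of the
    previous piece, so part of the signal is reproduced twice and the
    result is longer.
    """
    size = int(rate_hz * sample_ms / 1000)
    overlap = int(rate_hz * overlap_ms / 1000)
    if not 0 <= overlap < size:
        raise ValueError("overlap must be shorter than the sample")
    out = []
    for start in range(0, len(signal), size - overlap):
        out.extend(signal[start:start + size])
    return out


rate = 8000                      # assumed sampling rate, samples per second
one_second = list(range(rate))   # stand-in for one second of recorded speech

faster = compress(one_second, rate, keep_ms=20, discard_ms=20)
slower = expand(one_second, rate, sample_ms=40, overlap_ms=20)

print(len(faster) / rate)   # duration in seconds after compression
print(len(slower) / rate)   # duration in seconds after expansion
```

With equal retained and discarded intervals the duration is halved, and
with a 50% overlap it is roughly doubled. In neither case are the retained
samples themselves altered, which is why pitch is unaffected.
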

The control of speech rate that was made possible by the commercial
availability of equipment for sampling speech in the manner just de- 
scribed has stimulated a great deal of research concerning the effect 
of speech rate on the intelligibility of words and phrases, and the 
comprehensibility of fluent speech (Fairbanks, Guttman, & Miron, 1957c;
Fairbanks & Kodman, 1957; Foulke, Amster, Nolan, & Bixler, 1962; Foulke &
Sticht, 1967a; Friedman, Orr, Freedle, & Norris, 1966; Garvey, 1953b; Orr
& Friedman, 1964). Experimental attention
has been given to a variety of questions in which word rate figures as a 
factor. There has been an accumulation of experimental results which 
support the general conclusion that speech may be presented at a rate 
in the neighborhood of 275 words per minute (wpm) with the expectation 
of satisfactory comprehension, and if an appropriate training experi- 
ence can be devised, comprehension of speech at much higher word rates 
may be possible as well. Because of these findings, many people have 
begun to give serious consideration to the benefits that might be realized 
by the use of rate-controlled recorded speech, and there has been a
steadily increasing interaction between researchers and educators in
developing its practical applications. In addition, those interested in
basic research on the perception of speech have taken advantage of the
opportunity to control speech rate while holding other parameters constant
(Foulke & Sticht, 1967a; Friedman & Johnson, 1968; Miron & Brown, 1968;
Overmann, 1969; Wilson, 1969).

The first Louisville Conference on Time-Compressed Speech was convened
at the University of Louisville on October 19, 20, and 21, 1966. The
Conference was presented under the joint sponsorship of the Library of
Congress and the University of Louisville, with additional financial
support from the Office of Education. A volume containing the proceedings
of the Conference, and an extensive list of references to the research
literature on rate-controlled recorded speech, was prepared and
distributed. This volume has proved to be a valuable source of information
for those interested in rate-controlled recorded speech.

Another outcome of the Conference was the appointment of an implemen- 
tation committee, charged with the responsibility of promoting action on 
recommendations developed during the Conference. One of the most 
urgent recommendations of the Conference was the establishment of a 
national center from which rate-controlled speech could be obtained. In 
response to this recommendation, the Center for Rate-Controlled
Recordings was established at the University of Louisville, under the
direction of Dr. Emerson Foulke, with the implementation committee
serving as its Advisory Board. Since that time, the Board has met two
or three times each year to discuss the development of rate-controlled
recorded speech as a tool for research and education, to review the
activities of the Center, and to participate in the formulation of new
Center projects.

Another urgent recommendation of the first Louisville conference was
for the development of a mechanism for disseminating information
about rate-controlled recorded speech to those interested in its
applications. In response to this suggestion, the Center undertook the
publication of a monthly newsletter which reports research plans and findings,
new applications, equipment development, and other information of in- 
terest to workers in the field. In addition, the Center fills requests for 
research reports and demonstration tapes containing samples of recorded 
speech, compressed or expanded in time by the several known methods. 

Since the first Louisville conference, there has been a rapid growth in 
the level of interest and activity concerning rate-controlled recorded
speech. Accordingly, the Center's Board decided to convene a second 
Louisville conference to serve this interest and the related interest of 
frequency-controlled speech. The Second Louisville Conference on Rate 
and/or Frequency-Controlled Speech was held at the University of
Louisville on October 22, 23, and 24, 1969, under the sponsorship of
the University of Louisville, with financial support from the American 
Foundation for the Blind, the Library of Congress, and the Office of Edu- 
cation. This Conference was attended by approximately 125 people, 
representing such fields as psychology, linguistics, education, educa- 
tional administration, library science, engineering, and industry. The 
Conference program consisted of reports in three categories: basic re- 
search concerning the perception of time and/or frequency-controlled 
speech; technical reports concerning the production of time and/or 
frequency-controlled speech; reports of practical applications of such
speech in educational, industrial, and other settings. A preconference
workshop was held for the purpose of providing some exposure to relevant 
terms and concepts for those unfamiliar with the area. The first confer- 
ence day included a luncheon meeting with Dr. A. Hood Roberts as the 
guest speaker.*

This volume contains the 33 conference reports. Since there was con- 
siderable overlap in the references cited by authors, it was decided not 
to include a list of references at the end of each report. Instead, the 
references cited by authors have been combined into a single list. This 
list has been augmented by entries from the reference file maintained by 
the Center for Rate-Controlled Recordings, and from a list of references
prepared by Dr. Daniel S. Beasley, Department of Audiology and Speech
Sciences, Michigan State University, and Dr. Willard R. Zemlin, Voice
and Hearing Sciences Research Laboratory, University of Illinois. This
list of references, though possibly not bibliographic in scope, is extensive,
and it is hoped that it will serve as a valuable resource to those wishing
to read in the area.

*Dr. A. Hood Roberts is affiliated with the Center for Applied Linguistics,
Washington, D.C. The title of his luncheon address was "Automation
and Speech."

In some cases, Conference reports were written by more than one au- 
thor. Unless otherwise indicated, these reports were presented by the 
senior authors. Dr. Daniel Ling and Dr. Paul Resta were scheduled 
to make reports to the Conference. Due to circumstances beyond their 
control, they were unable to attend the Conference. Nevertheless, 
their reports have been included in this volume. 

Mr. Stephen F. Temmer, President of Infotronic Systems, Inc., reported
on the Information Rate Changer, Mark III, which will be available
for distribution by Infotronic Systems before long. The Mark III
is a completely redesigned machine. Unlike previous models, it is 
not restricted to the reproduction of tape recorded at 15 ips. Further- 
more, if desired, the pitch of the recorded speech signal can be varied 
without affecting word rate. His report has not been included since it 
was an informal demonstration of the capabilities of the Information Rate 
Changer, Mark III. 

The preconference workshop was presented by Dr. Willard Zemlin, Dr. 
Emerson Foulke, and Dr. Robert Scott. Dr. Zemlin presented a discus- 
sion of the mechanisms involved in speech production and hearing, and 
of acoustical energy containing speech information. Dr. Foulke explained 
the compression or expansion of speech by the sampling method and de- 
scribed the manner in which it is accomplished by electromechanical 
compressors of the Fairbanks type. Dr. Scott described the general 
procedures involved when computers are used for the compression or 
expansion of speech by the sampling method. The remarks of those who 
conducted the Workshop have not been included in this volume, since they 
were made extemporaneously, and since the effort to record them on 
tape was not entirely successful. However, no new information was pre- 
sented at the Workshop. Its purpose was to provide a background for 
inexperienced Conference participants, and the information presented 
is generally available elsewhere.



CHAPTER II



AN INTRODUCTION TO SPEECH TIME COMPRESSION TECHNIQUES:
THE EARLY DEVELOPMENT OF SPEECH TIME COMPRESSION
CONCEPT AND TECHNOLOGY
H. Leslie Cramer*



Introduction 

It should be obvious that, until it was possible to record and play back 
speech or sound in some manner, it was impossible to develop any sort 
of speech compression system. Speech time compression has become
possible, and has been developed, only as the technology for mechanical
and electronic acoustic recording has advanced.

There are two parallel developments that have taken place. One is the 
conceptual development of time compression. The second is the devel- 
opment of audio recording-playback systems, which, although preceding 
the development of the concept of time compression, will be taken up in 
the latter part of this paper. 



The Conceptual Development of Time Compression Methods 

Following are findings of some of the significant experiments that led 
researchers gradually into the idea of time compressing speech. 

One of the earliest experimenters in this field was Harvey Fletcher (1929) 
of the Bell Telephone Research Laboratories. In 1929, he published his 
findings on accelerating speech phonographically; that is, the playing of 
a phonograph record at a speed faster than that at which it was recorded. 
Playing recorded speech in this manner raises its frequency, resulting
in speech which has a "Donald Duck" or "Chipmunk" effect. At moderate
rates of acceleration, such speech is intelligible, especially with
practice in listening to it. There have been many studies dealing with
the comprehension of speech so produced. However, the remainder of this
paper will be limited to the development of time compression of speech
without attendant frequency distortion, or rise in pitch.

*Dr. H. Leslie Cramer is Senior Research Analyst, Peace Corps, 806
Connecticut Avenue, N.W., Washington, D.C. 20525.

The pattern shown in Figure 2.1 was made on a sound recording instrument
called an oscillograph. This is the tracing of the vowel sound /a/, as in
the word "father." A fundamental cycle, or pitch period, of this voice
tracing is represented by the portion between points A and B, while the
portions between points B-C and C-D represent successive pitch periods.
The part shown here is only a small part of a vowel sound, which may have
from 20 to [illegible] complete cycles, depending on the pitch of a
speaker's voice, his rate of speaking, and the particular vowel spoken.

Gemelli (1934) in Italy and Peterson (1939) in the United States both
experimented with the time duration of a phoneme which is necessary for
it to be properly perceived. Their findings were nearly identical; that
is, both discovered that only one or two complete pitch periods of a
vowel sound are necessary for its perception and identification. These
findings made it clear that, at least in vowel sounds, there is a high
degree of redundancy in speech.

Steinberg (1936) reported that speech rates could be increased by playing 
records at accelerated speed without a great loss in intelligibility, at 
least with moderate rates of increase. 

In 1940, Goldstein at Columbia University started experimenting with 
rate of speech to determine the comprehension of continuous discourse 
at gradually increasing increments of words per minute (wpm). He re- 
corded lectures at increasing wpm rates and then presented this recorded 
material to students to determine how well they could understand it. The 
maximum rate of 325 wpm was produced by partially accelerating a
phonograph record which had been recorded at 285 wpm. This was done
because he was unable to find a speaker who could articulate clearly at
325 wpm. The 325 wpm presentation, according to Goldstein, was not,
however, noticeably distorted. He found that his subjects had fairly good
perception and understanding at this high rate. This led to the idea that 
our listening speed is primarily limited by the rate of speech production 
rather than by perceptual or cognitive structures. 

Miller (1946) and Miller and Licklider (1950) experimented with an elec- 
tronic switching system for interrupting speech. This process blanked 
out alternate portions of speech. With 50% of the speech cut out,
intelligibility fell only 15%.

Figure 2.1. Oscillograph tracing of the /a/ sound of the word
father. (This includes only about one-fourth of the pitch periods of the
/a/ sound.)

In 1948, John Black, at Ohio State University, conducted research for
the Office of Naval Research (ONR). He was experimenting to determine 
the significance of different phonemes for word intelligibility. He sim- 
ply used a razor blade to cut pieces out of a recorded tape, splicing the 
remaining pieces together. This was done to analyze the contribution of 
vowels and consonants to the intelligibility of single words. 

It was Black's report on this ONR research which stimulated Garvey and
Henneman (1950) to work on the "cut-splice" method. They reasoned
that Black's cut and splice method could be used to eliminate part of the
speech recorded on a tape, as Miller and Licklider had done in their
study of electronically interrupted speech. With Black's method, however,
the gaps of silence in Miller and Licklider's process would be
eliminated and a saving in time would result. Their reasoning was sound,
and a highly intelligible speech record was produced at speed-up ratios
from 33% to 400% (1.25 to 4 times normal).

This method for time compressing speech can best be conceived by visu- 
alizing cutting alternate one-quarter inch pieces out of a recorded tape. 
Every other piece may be discarded, and those remaining spliced back 
together. Such a processed tape would make it possible to hear a half- 
hour lecture in 15 minutes because it is literally only half there. However, 
because each segment is played back at the speed at which it was recorded, 
there would not be the rise in pitch, or "Donald Duck" effect. Instead, 
the voice would sound normal in terms of pitch, and only the speed, or 
wpm rate, would have increased. 
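
The difference between speeding up playback and discarding alternate
pieces can be checked numerically. The following Python sketch is only an
illustration under stated assumptions: a pure 100 Hz tone stands in for a
voiced sound, the signal is digitized at an assumed 8,000 samples per
second, the pieces are 20 msec long rather than one-quarter inch of tape,
and pitch is estimated crudely by counting upward zero crossings. All
function names are hypothetical.

```python
import math

RATE = 8000  # assumed sampling rate, samples per second


def tone(freq_hz, seconds):
    """A pure tone standing in for a voiced sound of fixed pitch."""
    count = int(RATE * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / RATE) for i in range(count)]


def speed_up(signal, factor):
    """Faster playback: every factor-th sample, heard at the same RATE."""
    return signal[::factor]


def chop_splice(signal, piece):
    """Keep one piece of `piece` samples, discard the next, abut the rest."""
    out = []
    for start in range(0, len(signal), 2 * piece):
        out.extend(signal[start:start + piece])
    return out


def estimated_pitch(signal):
    """Upward zero crossings per second of playback (a crude estimate)."""
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if a < 0 <= b)
    return crossings / (len(signal) / RATE)


original = tone(100, 1.0)
fast = speed_up(original, 2)
spliced = chop_splice(original, piece=160)   # 160 samples = 20 msec pieces

for name, sig in (("original", original),
                  ("speed-up", fast),
                  ("chop-splice", spliced)):
    print(name, len(sig) / RATE, "sec,",
          round(estimated_pitch(sig)), "Hz (approx.)")
```

Both processed signals last half as long as the original, but only the
speeded-up version shows its estimated pitch roughly doubled. The 20 msec
piece length was chosen to hold a whole number of cycles of the test tone,
so the splices here are smooth; real speech offers no such guarantee.
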

Figure 2.2 shows the comparison of intelligibility of "chop-splice"
produced time-compressed speech with phonographically accelerated speech
produced by both Garvey and Steinberg. This figure shows that the
University of Virginia "chop-splice" method (Garvey) produces speech which
remains above 90% intelligible at 2.5 times the input ratio. It may also
be seen from this graph that the phonographic acceleration of speech by
both Steinberg and Garvey does not produce speech as intelligible as the
"chop-splice" method.

After finishing his thesis at the University of Virginia, Garvey was quite 
tired of cutting and splicing pieces of tape together. Some time ago he
stated that he was so sick and tired of recording tape and splicing tape
that he hoped in his entire life he'd never see another tape splicer or reel
of tape.

It is fortunate for researchers that within a couple of years of Garvey's 
work with the "cut-splice" method, Grant Fairbanks, W. L. Everitt, and
R. P. Jaeger (1953, 1954, 1959) at the University of Illinois applied for 
a patent on an instrument which would automatically accomplish the same 
result in terms of eliminating pieces of speech. 




Figure 2.2. Comparison of intelligibility loss for various speed-up
rates between the "chop-splice" techniques and speed-up methods
involving frequency shift. (From Garvey and Henneman, 1950, p. 16,
Figure 6.) [Vertical axis: percentage intelligibility. Curves: U. Va.
chop-splice method; Steinberg; U. Va. speed-up with frequency shift.]

The method of automatically scanning a magnetically recorded tape,
reproducing one portion and eliminating another portion of each speech
segment, was developed by Fairbanks et al. (1953, 1954, 1959).

Referring to Figure 2.3, tape loop (1) traveling in the direction shown
by arrow (7) passes over erase head (8) and recording head (9). The tape 
loop (1) then goes over idler (2), down around the rotating head assembly 
(10), between the tape drive capstan (5) and pressure roller (6), around 
tension adjusting wheel (3) and back to erase head (8) where it started. 
When the compressor is in operation, material on the tape is erased at 
erase head (8) in order to record cleanly at the record head (9). The 
recorded tape passes the rotating head assembly (10) in the direction 
shown by arrow (7). The tape moves faster than the rotating head as- 
sembly, so that speech recorded on the tape is picked up by any one of 
the four heads (A, B, C, and D) in the assembly over which it is passing. 
At the instant when head A leaves contact with the tape, head B contacts 
the tape. Everything recorded on the tape wrapped around the rotating 
head assembly between heads A and B will not be scanned or played back 
by either head A or B and therefore will be discarded. The temporal
length of the unscanned material is referred to as the interval discarded
(I_d), while the part played back by each head constitutes the interval
sampled (I_s). These two factors can be varied with the Fairbanks
equipment so that one may specify either a sampling interval or a discard
interval at any given compression ratio. Of the three factors
(compression ratio, discard interval, and sampling interval), two have to
be fixed.*
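
The dependence among the three factors can be made concrete with a short
Python sketch. It assumes that the compression ratio is the original
duration divided by the compressed duration, that is, the interval sampled
plus the interval discarded, divided by the interval sampled. That
definition is an assumption made here for illustration; the operating
formula for the Fairbanks equipment is given in the source cited in the
footnote, not in this chapter. The function name is hypothetical.

```python
def solve_intervals(ratio=None, sample_ms=None, discard_ms=None):
    """Given any two of the three quantities, return all three.

    Assumes ratio = (sample_ms + discard_ms) / sample_ms, i.e. original
    duration divided by compressed duration (an illustrative definition).
    """
    given = [v for v in (ratio, sample_ms, discard_ms) if v is not None]
    if len(given) != 2:
        raise ValueError("exactly two of the three values must be given")
    if ratio is not None and ratio <= 1:
        raise ValueError("ratio must exceed 1 for compression")
    if sample_ms is not None and sample_ms <= 0:
        raise ValueError("sample_ms must be positive")

    if ratio is None:
        ratio = (sample_ms + discard_ms) / sample_ms
    elif discard_ms is None:
        discard_ms = sample_ms * (ratio - 1)
    else:
        sample_ms = discard_ms / (ratio - 1)
    return ratio, sample_ms, discard_ms


# Fix the ratio and the discard interval; the sampling interval follows.
print(solve_intervals(ratio=2.0, discard_ms=30))

# Fix both intervals; the ratio follows.
print(solve_intervals(sample_ms=20, discard_ms=20))
```

Under this definition, holding the discard interval at a chosen value
while raising the compression ratio shortens the sampling interval, which
is why two of the three quantities determine the third.
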

It may be seen in retrospect that the work of Fletcher, Steinberg, and 
Goldstein showed that one could clearly understand speech at rates faster 
than speakers are capable of articulating and producing continuous dis- 
course. Gemelli and Peterson added experimental evidence that one 
need hear only a small part of vowel sounds to properly identify them. 
Miller found that alternate portions of speech could be blanked out with- 
out a great decrease in intelligibility. Black and Garvey both eliminated 
pieces of the speech record without leaving blank spaces so that the wpm 
rate was increased without a pitch rise or great loss in intelligibility. 

The final synthesis of these findings was their embodiment in the speech 
compressor invented by Fairbanks et al. (1959). 



*A more complete explanation of Fairbanks' Compressor, complete
with operating formula and peripheral equipment adjustments, is available
in Cramer (1968), pp. 40-51 and pp. 191-203.

Figure 2.3. Detail drawing of Fairbanks' compressor. (From
Fairbanks, Everitt, and Jaeger, 1959.) Legend (partial):

 3 - Tension Adjusting Wheel
 4 - Mounting Plate
 5 - Capstan
 6 - Pressure Roller
 7 - Direction of Tape Loop Travel
10 - Rotating Head Assembly
11 - Playback Heads
12 - Direction of Rotating Head Assembly when Compressing



The Development of the Technology for Time Compressing Speech

The second major development referred to at the beginning of this paper 
relates to the technology for electromechanical recording and playback 
of auditory signals. The treatment of this area must necessarily be re- 
stricted to that bearing directly on methods of either recording or play- 
back that scan or sample an original auditory input, or otherwise 
translate frequencies up or down the scale. The writer believes that 
the coverage of this topic is complete, but will be most interested in 
references to any other devices on which patents are held that are not 
reported here. 

In the following brief review of patents, the dates given in the text will 
be the original filing dates, while those in the references represent the 
actual date a patent was issued. This seems necessary in view of the 
fact that it is the date of conception of the idea that is important, and in 
many cases there was a substantial delay in the awarding of the patent. 
However, the interested reader needs the date of issue in order to re- 
trieve information on the patents. 

The earliest record of a speech scanning system is a U. S. patent filed 
by N. R. French and M. K. Zinn (1928) in December 1924. This system 
proposes to rotate a microphone around a sound pipe or speaking tube 
bent in a circle with a slot around its edge (see Figure 2.4). 

This patent, it turns out on analysis, would not work with air as the 
sound carrying medium, since the tube would have to be 15 feet in
circumference and scanned at 32,000 rpm in order to accomplish 50%
compression. This patent therefore really only represents the concept of
scanning without the reduction to practice normally required in a patent. 

In 1930, Berthold Freund (1935) applied for a patent on a device used 
for scanning motion picture film sound recordings which could be used to 
vary the length of sound records. This was developed for synchronizing 
sound to the film track and for shortening the speech record to match a 
speeded up portion of film, without having a pitch rise. This appears 
to be the first apparatus capable of actually time compressing speech, 
although no claim for use other than to match film records was made (see 
Figure 2.5).

In 1935, Homer W. Dudley (1938) applied for a patent on a signalling 
system which sampled every other pitch period of speech and transmitted 
it to a distant point. At that distant point, each pitch period was
repeated once to reestablish a wave form similar to the original. The
patent made no claim for saving the listener time, as it was only to save
time on transmission that it was developed. This same apparatus could,
without repeating each signal on the output end, compress speech.
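
The logic of the scheme described above can be sketched in a few lines of
Python. This is not a model of Dudley's apparatus; it assumes the speech
has already been divided into pitch periods, represents each period by a
hypothetical label, and uses function names invented for illustration.

```python
def transmit(pitch_periods):
    """Send only every other pitch period (the 1st, 3rd, 5th, ...)."""
    return pitch_periods[::2]


def receive(sent_periods):
    """Repeat each received pitch period once to restore the length."""
    restored = []
    for period in sent_periods:
        restored.extend([period, period])
    return restored


# Hypothetical labels for six successive pitch periods.
periods = ["p1", "p2", "p3", "p4", "p5", "p6"]

sent = transmit(periods)
print(sent)            # what travels over the line
print(receive(sent))   # what the distant listener hears
```

Listening to the transmitted sequence directly, without the repetition
step, gives a signal half as long as the original, which is the sense in
which the same apparatus could compress speech.
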

Figure 2.4. Detail from French and Zinn (1928), showing their
Figure 12a and Figure 12b illustrating a rotating microphone sound tube
scanning system.

Figure 2.5. Detail from Freund (1935), showing his Figure 1 and
Figure 4 illustrating system for scanning motion picture film sound track.

In 1936, R. L. Miller (1939) applied for a patent on a signalling system
which used frequency division for speech bandwidth reduction. This 
patent anticipated the harmonic compressor as worked out recently by 
the American Foundation for the Blind. 

In 1936, Leonid Gabrilovitch (1939) developed a system for scanning a 
steel wire recording with rotating heads. This was similar to Dudley's 
in that it was designed to reduce frequencies for transmission. At the 
transmitting end of a line, every other segment was divided before
transmission; then, at the receiving end, it was multiplied in frequency
and repeated once before the next segment arrived (see Figure 2.6).

In 1938, Eduard Schuller (1942) patented a similar device in Germany 
which was used for playing back magnetic recordings in less time or 
in longer time than that in which they were actually recorded. In his 
patent he states: 

"If the sound head is rotated in the same direction as that 
of the travel of the record strip, . . . an acoustical time 
compressing is obtained and the reproduced signal has its 
original frequency but is read off in less time than was re- 
quired for the recording. " 

This is the first clear reference to time compressing speech with the 
method Fairbanks later developed, apparently independently. 

Figure 2.7 shows Figures 1 and 2 from Schuller's patent, which greatly
resemble others.

In 1944, Gabor (1949) applied for a patent on a device using microscope 
lenses in a ring to scan the sound track of a motion picture film. See 
Figure 2.8 for Gabor's diagram of this process.

In 1947, Gabor (1950) developed many ingenious ways of both scanning 
and blending adjacent samples of the speech record. See Figure 2.9
for diagrams of some of these. These figures display systems of
scanning a track photographically and electronically. Gabor's Figure 11
shows how lenses are formed by discharging a spark, synchronized to
voice pitch periods, through water. The bubbles of gas so produced
are then circulated by a pump past the sound record to be scanned. A
real Rube Goldberg device!

In 1950, Vilbig (1950, 1952, 1967) described a string filter device which 
is a physical analog of the electronic harmonic compressor. It could be 
used only to compress by a factor of two to one. Figure 2.10 shows a
picture of this complex system.

Figure 2.6. From Gabrilovitch (1939), showing his Figure 2
illustrating the rotating scanning heads.

Figure 2.7. From Schuller (1942), showing his Figure 1 illustrating
his rotating magnetic pickup heads.

Figure 2.8. From Gabor (1949), Figure 1 illustrating his lens drum
used to scan motion picture film tracks.

Figure 2.9. From Gabor (1950), Figure 10 and Figure 11 illustrating
his apparatus for scanning a motion picture film track in synchrony with
voice pitch periods.

Figure 2.10. From Vilbig (1950), Figure 1 and Figure 2 illustrating
his string filter analog of the harmonic compressor. (Vilbig's Fig. 2:
construction of the exciting coils and the pick-up in a cross sectional
view.)

In the fall of 1952, Grant Fairbanks et al. (1959) applied for their patent
on the compressor system* developed at the Speech Research Laboratories
at the University of Illinois. This system uses the rotating head assembly
shown in Figure 2.3.

Anton Springer (1961a, 1961b; 1962a, 1962b, 1962c; 1963) filed a series
of patents on improvements on the rotating heads and driving mechanisms
starting in 1956. These have been incorporated into the Eltro Tempo
Regulator manufactured in Germany.** These machines have the advantage
of a continuously adjustable compression rate up to 1.7 times normal
wpm rate, but a disadvantage in terms of the long discard interval of 40
milliseconds.

Schimmel and Clay (1963) filed for additional improvements on rotating
heads. This was mainly an air suspension system to reduce tape and
head wear. Gabor (1965) patented a multiple head system with provision
for synchronizing the sampling to the occurrence of pitch periods in the
speech record being processed.

Robert J. Wenzel (1962), working at Massachusetts Institute of Technology
with John Dupress, developed a jitter action time compression device
using the ignition timing cam from an automobile as the basic driving
device. This did not work too well due to mechanical vibration but may well
deserve renewed effort as it would be an inexpensive system to produce.

Jay Harold Ball (1961) developed the first known computer program for
compressing speech. His work was followed by Scott (1965), H. L.
Cramer and R. P. Talambiras (1970), and S. U. Qureshi and Y. J.
Kingma (1970).

There are reportedly two solid state systems under development using 
essentially a long tapped delay line for slowing speech and thereby reducing
frequencies below normal. 



*This device was first commercially available as the Vari-Vox machine, 
manufactured by Kay Electronics, Inc. in New Jersey. It is now com- 
mercially available in improved form from Discerned Sound, Inc. , North 
Hollywood, California. Samples of the speech produced on this type of 
machine are available from the Center for Rate-Controlled Recordings, 
University of Louisville, Louisville, Kentucky. This Center also has 
facilities for processing tapes at any specified amount of compression 
at a nominal fee. 

**Available from Gotham Audio, New York, New York.

To summarize, in terms of available systems today, there are three
different approaches. First, in terms of both discovery and amount of 
usage today, is the rotating head assembly system of Fairbanks et al. 
Secondly, we have several computer programs, somewhat costly and 
not generally available. Thirdly, we have the Harmonic Compressor 
developed by the American Foundation for the Blind and now available 
at the Perceptual Alternatives Laboratory at the University of Louis- 
ville, Louisville, Kentucky. 



CHAPTER III 



EFFECT OF RATE OF COMPRESSION AND MODE OF PRESENTATION
ON THE COMPREHENSION OF A RECORDED COMMUNICATION
TO JUNIOR COLLEGE STUDENTS OF VARYING APTITUDES

Clement Cordell Parker*



A problem common to most educational institutions is to find better
techniques to send information across media with speed and reliability. The
problem is aggravated within junior colleges because of the increased
heterogeneity of their student population.

A number of studies have been made to determine the relationship of
rate of presentation with degree of comprehension. Harwood (1955)
discovered an insignificant loss as word rate was increased. Fairbanks,
Guttman, and Miron (1957c) found little difference in the comprehension
of messages presented at 141, 201, and 282 words per minute (wpm).

The results of these and other studies seem to indicate that while there 
is a loss in comprehension with an increase in rate of presentation, the 
loss is insignificant up to about 280 wpm. 

Sticht (1968) trichotomized 135 Army inductees into three mental aptitude
categories--low, medium, and high--according to their Air Force
Qualification Test scores. He found that increasing the speech rate had
a greater disrupting effect on test performance of the higher aptitude
subjects than those of low aptitude.

Travers (1964) reports that he and Jester presented reading passages
through hearing alone, vision alone, and both hearing and vision. They
found that at the slower speeds no advantage was found for the audiovisual
presentation, but at higher speeds the audiovisual channel proved
to be superior. Loper (1966) measured comprehension and retention
using two modes: aural and visually augmented aural, where televised
pictorials were used to supplement the aural message. He concluded
that visual augmentation does not provide much assistance to an aural
presentation.

*Mr. Parker is Chairman of the Department of Speech & Drama on the
Northeast Campus of the Tarrant County Junior College District in
Fort Worth, Texas. He is a candidate for the Doctor of Education degree
at North Texas State University in Denton, Texas.

Junior college students score lower on aptitude tests than those students
in four year colleges. The research (Cross, 1968) is national in scope,
unanimous in findings, and is based on a staggering array of accepted
measures of academic aptitude.

This study was conducted with the hope that an efficient method for
processing information for junior college students could be discovered,
thereby increasing their learning and success potential.



Statement of the Problem 

The problem of this study was to find a more efficient way to store and
transmit recorded information, thereby increasing the efficiency of
programmed learning centers and reducing the time required for utilization.
More specifically, the problem was to determine the rate of compression
and mode of presentation having the most favorable impact on the
comprehension of a recorded communication to junior college students
of varying aptitudes.

Subproblems included the following: (1) determination to what degree
rate of compression could be increased without significant loss in
comprehension, (2) determination to what degree rate of comprehension
could be increased with the simultaneous presentation of compressed
speech and the printed page, and (3) determination of the effects of rate
of compression and mode of presentation to students representing all
levels of aptitude, low levels of aptitude, and high levels of aptitude.



Definition of Terms 

1. Compressed speech--oral, tape-recorded communication in which
brief segments of the message have been deleted without significant
distortion in vocal pitch or quality. (1a) Zero compression, normal
speaking rate; (1b) one-third compression, compressed speech requiring
two-thirds of the original time for presentation; (1c) one-half
compression, compressed speech requiring half of the original time for
presentation.

2. Audio-ocular--the addition of the printed page to match an aural
message in order to add the factor of sight to a factual presentation.

3. Test of comprehension--the correct number of responses to the
comprehension test within Form B of the 1960 edition of the Nelson-Denny
Reading Test.

4. Test of aptitude--the correct number of responses to the Verbal
Comprehension section of the Guilford-Zimmerman Aptitude Survey.
(4a) All-levels group, included all students participating in the experiment
minus those taken from the initial sample because of absence during
one of the tests or failure to hear all of the selections; (4b) high-level
group, those students who, within their treatment condition, scored at
or above the sixty-seventh percentile on the test of aptitude; (4c) low-level
group, those students who, within their treatment condition, scored at
or below the thirty-third percentile on the test of aptitude.
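
The first definition can be illustrated with a minimal sketch of periodic sampling, in which short stretches of a recording are alternately kept and discarded. The function names, the segment lengths, and the use of a plain Python list to stand in for the recording are assumptions made for illustration only; the sketch simply abuts the retained segments and does not model the tape equipment used in the study.

```python
def compress(samples, keep, discard):
    """Keep `keep` samples, drop the next `discard`, and repeat to the end.

    Hypothetical helper: the retained segments are simply abutted.
    """
    if keep <= 0 or discard < 0:
        raise ValueError("keep must be positive and discard non-negative")
    period = keep + discard
    out = []
    for start in range(0, len(samples), period):
        out.extend(samples[start:start + keep])
    return out


def compression_fraction(keep, discard):
    """Fraction of the original duration that is removed."""
    return discard / (keep + discard)


if __name__ == "__main__":
    recording = list(range(12000))  # stand-in for 12,000 audio samples
    for label, keep, discard in (("one-third", 400, 200), ("one-half", 300, 300)):
        shortened = compress(recording, keep, discard)
        print(label, compression_fraction(keep, discard), len(shortened) / len(recording))
```

Under these assumptions, discarding one sample in every three leaves two-thirds of the original duration, and discarding every other segment leaves half, which corresponds to definitions (1b) and (1c).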



Procedure 

The eight selections within the test of comprehension were recorded by
a professional speaker, and compressed to one-third and one-half degrees
by the Center for Rate-Controlled Recordings at the University
of Louisville. Compression was achieved through the use of a Fairbanks
type compressor. Instructions and a 2-minute practice selection were
programmed into the tapes.

Subjects were 429 students enrolled in the Freshman composition classes 
during the fall semester of the 1969-70 academic year on the Northeast 
Campus of the Tarrant County Junior College District in Fort Worth, 
Texas. Eighteen of the 22 available day sections were selected at ran- 
dom, and a table of random numbers utilized to populate the six experi- 
mental groups with three sections in each group (about 75 students for 
each of the six experimental groups). 
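
The sampling step described above, drawing 18 of the 22 sections and dividing them into six groups of three, can be sketched as follows. The study used a table of random numbers; the pseudo-random generator, the seed, the numbering of sections from 1 to 22, and the function name are all assumptions introduced for illustration.

```python
import random


def assign_sections(available, n_selected, n_groups, seed=None):
    """Draw sections without replacement and split them into equal groups.

    Hypothetical stand-in for the table of random numbers used in the study.
    """
    rng = random.Random(seed)
    chosen = rng.sample(list(available), n_selected)  # already in random order
    per_group = n_selected // n_groups
    return [sorted(chosen[i * per_group:(i + 1) * per_group])
            for i in range(n_groups)]


if __name__ == "__main__":
    groups = assign_sections(range(1, 23), n_selected=18, n_groups=6, seed=1969)
    for number, sections in enumerate(groups, start=1):
        print(f"group {number}: sections {sections}")
```

The sketch does not say which group receives which combination of compression rate and mode of presentation, since the source does not specify how treatments were attached to groups.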

The test of comprehension was administered during the first week of 
classes in the Language Laboratory within the Programmed Learning 
Center. Students were free to select any one of the 30 available car- 
rels, and each carrel was equipped with padded earphones which could 
be adjusted for comfortable listening. Each of the output units was 
locked into the channel selected for the experiment. A copy of the test 
of comprehension was available to all audio-ocular groups, and included
the printed copy of each of the recorded messages. The aural-only 
groups received only the test questions. Each carrel was supplied with 
pencil, answer sheet, and short questionnaire. 

When all Ss were seated, they were asked to place their earphones on 
their heads. Programmed instructions began immediately thereafter 
with an admonition to adjust the earphones for comfortable listening and 
to confirm ability to hear. Students then heard the 2-minute introductory
message and the eight selections of the test of comprehension. Students
were allowed 15 seconds per question to answer each of the 36 multiple-
choice items. When the last test was finished, students were asked to
fill out a brief questionnaire and were thanked for their participation.

Subjects were also given the test of aptitude during the first week of
classes. All of the tests were administered by the same person in com- 
parable classrooms. All tests were hand-scored and the results re- 
corded on keypunch worksheets. 



Treatment of Data 

Three 3 x 2 classifications of data were created. The 3 x 2 schema
represented two modes of presentation (aural-only and audio-ocular)
and three degrees of compression (zero, one-third, and one-half). The
first 3 x 2 classification represented the all-levels group, the second
the high-level group, and the third the low-level group. Two-way
analysis of variance yielded the following results:



TABLE 3.1

TWO-WAY ANALYSIS OF VARIANCE FOR TEST OF
COMPREHENSION ALL-LEVELS GROUP

Source of Variation            SS      df         MS         F
Mode of Presentation     1,428.35       1   1,428.35    54.73*
Rate of Compression      1,101.43       2     550.71    21.10*
Interaction                349.97       2     174.99     6.70*
Within                  11,040.34     423      26.10

*p < 0.05



TABLE 3.2

TWO-WAY ANALYSIS OF VARIANCE FOR TEST OF
COMPREHENSION HIGH-LEVEL GROUP

Source of Variation            SS      df         MS         F
Mode of Presentation       671.79       1     671.79    35.16*
Rate of Compression        291.48       2     145.74     7.63*
Interaction                253.63       2     126.81     6.64*
Within                   2,617.42     137      19.10

*p < 0.05





TABLE 3.3

TWO-WAY ANALYSIS OF VARIANCE FOR TEST OF
COMPREHENSION LOW-LEVEL GROUP

Source of Variation            SS      df         MS         F
Mode of Presentation       204.43       1     204.43    10.31*
Rate of Compression        424.72       2     212.36    10.71*
Interaction                 43.67       2      21.83     1.10
Within                   2,716.85     137      19.83

*p < 0.05
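
The arithmetic behind Tables 3.1 to 3.3 can be retraced from the printed sums of squares and degrees of freedom: each mean square is SS divided by df, and each F is the effect mean square divided by the within mean square. The sketch below is only a consistency check on the printed figures; the dictionary layout and function names are assumptions, and no raw scores are involved.

```python
# Printed values from Tables 3.1 to 3.3: (SS, df, printed F) for each effect,
# and (SS, df) for the Within row.
ANOVA_TABLES = {
    "all-levels": {
        "within": (11040.34, 423),
        "effects": {
            "Mode of Presentation": (1428.35, 1, 54.73),
            "Rate of Compression": (1101.43, 2, 21.10),
            "Interaction": (349.97, 2, 6.70),
        },
    },
    "high-level": {
        "within": (2617.42, 137),
        "effects": {
            "Mode of Presentation": (671.79, 1, 35.16),
            "Rate of Compression": (291.48, 2, 7.63),
            "Interaction": (253.63, 2, 6.64),
        },
    },
    "low-level": {
        "within": (2716.85, 137),
        "effects": {
            "Mode of Presentation": (204.43, 1, 10.31),
            "Rate of Compression": (424.72, 2, 10.71),
            "Interaction": (43.67, 2, 1.10),
        },
    },
}


def f_ratio(ss_effect, df_effect, ss_within, df_within):
    """F = (SS_effect / df_effect) / (SS_within / df_within)."""
    return (ss_effect / df_effect) / (ss_within / df_within)


def recompute(tables):
    """Return (group, source, recomputed F, printed F) for every effect."""
    rows = []
    for group, table in tables.items():
        ss_w, df_w = table["within"]
        for source, (ss, df, printed) in table["effects"].items():
            rows.append((group, source, f_ratio(ss, df, ss_w, df_w), printed))
    return rows


if __name__ == "__main__":
    for group, source, computed, printed in recompute(ANOVA_TABLES):
        print(f"{group:11s} {source:21s} computed F = {computed:6.2f}  printed F = {printed:6.2f}")
```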



Since the results from the analysis of variance permitted rejection of all 
null hypotheses of no difference due to rate of compression or mode of 
presentation at different aptitude levels, t tests were run for comparison
of certain means with the following results: 



TABLE 3.4

MEAN COMPREHENSION SCORES ALL-LEVELS

                          AURAL-ONLY    AUDIO-OCULAR
Zero Compression             19.66          21.35
One-third Compression        18.13          21.36
One-half Compression         13.75          19.81

Key (lines connecting pairs of means in the original):
No significant difference between means.
Significant difference at .05 level or better.

TABLE 3.5

MEAN COMPREHENSION SCORES HIGH-LEVEL

                          AURAL-ONLY    AUDIO-OCULAR
Zero Compression             22.88          24.63
One-third Compression        20.48          23.75
One-half Compression         16.24          24.26



TABLE 3.6

MEAN COMPREHENSION SCORES LOW-LEVEL

                          AURAL-ONLY    AUDIO-OCULAR
Zero Compression             17.20          18.04
One-third Compression        15.78          19.08
One-half Compression         12.33          15.39

Key (lines connecting pairs of means in the original):
No significant difference between means.
Significant difference at .05 level or better.



TABLE 3.7

t TESTS FOR COMPREHENSION SCORES ALL-LEVELS GROUP

Run     Mean     N     Mean     N     df        t
 1     19.66    76    21.35    80    154    -2.07*
 2     18.13    68    21.36    73    139    -3.74*
 3     13.75    63    19.81    69    130    -6.81*
 4     19.66    76    18.13    68    144     1.79
 5     18.13    68    13.75    63    129     4.91*
 6     21.35    80    21.36    73    151     -.01
 7     21.36    73    19.81    69    140     1.80

*p < 0.05

TABLE 3.8

t TESTS FOR COMPREHENSION SCORES HIGH-LEVEL GROUP

Run     Mean     N     Mean     N     df        t
 1     22.88    25    24.63    27     50    -1.44
 2     20.48    23    23.75    24     45    -2.56*
 3     16.24    21    24.26    23     42    -6.08*
 4     22.88    25    20.48    23     46     1.90
 5     20.48    23    16.24    21     42     3.21*
 6     24.63    27    23.75    24     49      .72
 7     23.75    24    24.26    23     45     -.40

*p < 0.05



TABLE 3.9

t TESTS FOR COMPREHENSION SCORES LOW-LEVEL GROUP

Run     Mean     N     Mean     N     df        t
 1     17.20    25    18.04    27     50     -.68
 2     15.78    23    19.08    24     45    -2.54*
 3     12.33    21    15.39    23     42    -2.27*
 4     17.20    25    15.78    23     46     1.10
 5     15.78    23    12.33    21     42     2.57*
 6     18.04    27    19.08    24     49     -.83
 7     19.08    24    15.39    23     45     2.84*

*p < 0.05
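
The degrees of freedom printed in Tables 3.7 to 3.9 can be cross-checked against the group sizes. The sketch below assumes an independent-groups t test with df = N1 + N2 - 2; the source does not state which form of the t test was used, so this is an assumption, and the data layout and function name are illustrative.

```python
# (run, N of first mean, N of second mean, printed df), as printed in the tables.
T_TABLES = {
    "3.7": [(1, 76, 80, 154), (2, 68, 73, 139), (3, 63, 69, 130), (4, 76, 68, 144),
            (5, 68, 63, 129), (6, 80, 73, 151), (7, 73, 69, 140)],
    "3.8": [(1, 25, 27, 50), (2, 23, 24, 45), (3, 21, 23, 42), (4, 25, 23, 46),
            (5, 23, 21, 42), (6, 27, 24, 49), (7, 24, 23, 45)],
    "3.9": [(1, 25, 27, 50), (2, 23, 24, 45), (3, 21, 23, 42), (4, 25, 23, 46),
            (5, 23, 21, 42), (6, 27, 24, 49), (7, 24, 23, 45)],
}


def df_mismatches(tables):
    """List (table, run, printed df, N1 + N2 - 2) wherever the two disagree."""
    found = []
    for table, runs in tables.items():
        for run, n1, n2, printed in runs:
            expected = n1 + n2 - 2
            if printed != expected:
                found.append((table, run, printed, expected))
    return found


if __name__ == "__main__":
    for table, run, printed, expected in df_mismatches(T_TABLES):
        print(f"Table {table}, run {run}: printed df {printed}, N1 + N2 - 2 = {expected}")
```

Under this assumption every printed df agrees with the group sizes except run 4 of Table 3.7, where 144 is printed and N1 + N2 - 2 gives 142. Whether that reflects a typesetting slip or a different computation cannot be determined from the source.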



Discussion

The simultaneous presentation of the printed page to match an aural pre- 
sentation resulted in significantly better comprehension for all aptitude 
levels hearing compressed speech. It was not, however, superior for 
the high and low aptitude level groups hearing normal rate recordings. 
Hence, it may be concluded that the printed page provides assistance in 
comprehension when the speaking rate is increased above the normal 
rate.

None of the aptitude levels experienced significant losses in comprehension
when messages were speeded to one-third compression. This illustrates
the suitability and efficiency of compressed speech for a junior
college population. Furthermore, except in low-aptitude groups, the
speed may be increased to one-half compression without significant loss
in comprehension, provided the printed page is supplied to match the
aural message. Comprehension was significantly decreased when the
aural-only messages were speeded to one-half compression. A speed
of one-half compression may be too great to result in acceptable
comprehension for aural-only groups.



CHAPTER IV



PERTURBATIONS OF SEX JUDGMENTS WITH TIME-COMPRESSED
AND FREQUENCY-DIVIDED SPEECH SIGNALS
Daniel S. Beasley and Willard R. Zemlin*



If time-compressed and frequency-divided speech is to be used in educa-
tional and clinical settings, the equivocal results of several studies of
subjective perceptual interpretation of the processed speech signal should 
be investigated. 



Time-Compressed Speech



Daniloff, Shriner, and Zemlin (1968a) observed female speakers to be
rated as more intelligible than male speakers when they spoke eight vowels
in an h-d context which were time-compressed using the Fairbanks
sampling method. However, Zemlin, Daniloff, and Shriner (1968) also
showed that listeners rated female time-compressed speech as more difficult
to listen to than male time-compressed speech. In addition, the
same judges preferred 30% time-compressed speech over 40% and 50%,
although the Daniloff et al. (1968a) study showed that intelligibility was
high up to compression rates of 70%. It appears phonemic quality, as
reflected by vowel intelligibility, may remain more stable at higher
compression ratios than phonetic quality. Foulke (1966c) distributed
recordings of time-compressed speech and questionnaires to blind Ss in several
geographical areas. Although the majority of the respondents found the
female easier to understand than the male (55% versus 45%), a larger
majority preferred to listen to the male (65% versus 32%). These results
suggest that speaker preference criteria of auditory Os may play an
equal if not greater role in the utilitarian consideration of such speech.
Evidence has been provided that phonemic quality is based on a relative
vowel hypothesis (Daniloff et al., 1968a; Potter & Steinberg, 1950),
whereas phonetic quality may be based on a modified fixed vowel
hypothesis as suggested by Slawson (1968). Phonemic quality of female
speech would be maintained longer than male speech due to the inherent
redundancy of female speech, but phonetic quality would decline earlier
for female speech because more of the characteristic pitch periods
(determining fundamental frequency) of the female, contra the male,
are discarded in the sampling technique. Listener preference may be
partially determined by this phonetic quality. Good phonemic quality
may not overcome a listener's dislike of listening to the material in a
prolonged listening task. It is then necessary to study preference values of
the listeners using time-compressed female speech in order to establish
possible reasons the male is preferred over the female. Such knowledge
would perhaps lead to methods of overcoming these attitudes, thereby
permitting, in the educational process, full advantage to be taken of the
high intelligibility of female time-compressed speech.

*Dr. Daniel S. Beasley is an Assistant Professor in the Department of
Audiology and Speech Sciences at Michigan State University, East Lansing,
Michigan 48823. Dr. Willard R. Zemlin is Director of the Speech and
Hearing Research Laboratory at the University of Illinois, Champaign,
Illinois 61820.



Frequency-Divided and Frequency-Divided
Time-Restored Speech

Daniloff et al. (1968a), in their vowel study, showed female frequency-divided
and frequency-divided time-restored speech had better phonemic
quality than male speech, as did Klumpp and Webster (1961) using a slow
playback, frequency-divided method. However, neither looked at preference
values for frequency-divided and frequency-divided time-restored
speech. Bennett and Byers (1967) investigated the use of frequency-divided
speech, using a slow playback method, on a geriatric population. Their
Ss preferred the male speech. Thus, sex of the speaker may yield
differential results for phonemic and phonetic quality in studies involving
frequency-divided speech. Based on the relative vowel hypothesis,
phonemic quality of female speech may remain higher than the male's, since
the female's lower formants, especially F2 (Thomas, 1968), unlike the
male's, are not shifted out of the normal experiential bandpass under
frequency-divided and frequency-divided time-restored conditions (Daniloff
et al., 1968a; Tiffany & Bennett, 1961). But the formant shifting does
affect phonetic quality, which is based on fixed values. In a prolonged
listening task, phonetic quality must be considered. The reason for the
above conflicting results may be that the more intelligible frequency-divided
and frequency-divided time-restored female speech, when shifted
toward the frequency domain of the male, begins to sound effeminate, a
cultural taboo in our society, or at least it used to be, and members of
the society, as listeners, may not prefer to listen to it.

The purpose of this study is to investigate the ratings of masculine-
feminine continuum poles of a male and female speaker whose speech has
been time-compressed, frequency-divided, and frequency-divided time-
restored. The masculine-feminine data will be compared to values
obtained on other scales in similar studies.



Method of Investigation 



Experimental Materials 

In order to adequately compare the phonemic analysis of Daniloff et al.
(1968a) to the phonetic analysis of this study, the stimuli consisted of
11 h-d context embedded vowels, spoken by a male (f0 = 104 Hz) and a
female (f0 = 198 Hz) at conversational pitch and effort level. The vowels
were processed through five conditions (20% through 60% in 10% steps)
of time-compressed and frequency-divided and frequency-divided time-
restored speech. Thus, there were 32 experimental sets of vowels: 2
normal (male and female); 10 time-compressed (5 males, 5 females); 10
frequency-divided (5 males, 5 females); and 10 frequency-divided time-
restored (5 males, 5 females).
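
The count of 32 experimental sets follows directly from the design as described in this paragraph. A minimal sketch of the enumeration is given below; the labels and tuple layout are assumptions chosen for illustration and follow the five-level description in the text.

```python
from itertools import product

SPEAKERS = ("male", "female")
PROCESSES = ("TC", "FD", "FD-TR")  # time-compressed, frequency-divided, FD time-restored
LEVELS = (20, 30, 40, 50, 60)      # percent, in 10% steps


def stimulus_sets():
    """Return one (speaker, process, level) tuple per experimental set."""
    sets = [(speaker, "normal", 0) for speaker in SPEAKERS]
    sets += [(speaker, process, level)
             for process, level, speaker in product(PROCESSES, LEVELS, SPEAKERS)]
    return sets


if __name__ == "__main__":
    all_sets = stimulus_sets()
    print(len(all_sets), "sets")
    for process in ("normal",) + PROCESSES:
        print(process, sum(1 for s in all_sets if s[1] == process))
```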

The 32 sets of vowels were randomized. All Os heard the same
randomized experimental tape. Approximately 2 seconds of silent interval
was provided between items in each set. Each set of words took about 
25 seconds playback time. 



Subjects 

Listeners consisted of 14 male and female college students in a controlled 
listening environment. 



Experimental Procedures 

Semantic differential type scales (Osgood, Suci, & Tannenbaum, 1957)
were used to assess phonetic quality. These attempt to elicit behavior
to alternatives which are representative of the various meanings over
which a concept (in this case, speech sample) may vary on a 7-point
scale of polar opposites to indicate direction and intensity of response.
Seven such semantic differential scales, chosen according to the Osgood
et al. (1957) criteria of relevance (of the scales to the concepts being
judged) and linearity of polar opposites (e.g., rugged-delicate may both
be favorable under certain circumstances), were used to elicit qualitative
judgment of the 32 sets of speech signals from the listeners. These
seven scales were: Fast-Slow, High-Low, Masculine-Feminine,
Like-Dislike, Harmonious-Dissonant, Loud-Soft, Pleasant-Unpleasant.

The O's task was to rate each of the 32 sets on each of the seven scales.
The observer heard a set, then was allotted 1 minute to respond to the 11
items in the set. A 1 minute response interval was used to allow the O
adequate response time on more difficult sets. Further, the long interval
aided in the forgetting of prior sets, thus minimizing the tendency of
the O to compare subsequent sets to prior sets. Prior to the beginning of a
set, three 1 kHz beeps were sounded as a warning to "get ready." A
single beep sounded at the end of a set, indicating the 1 minute rating
period had begun.

The response sheet consisted of three scale-position randomizations
(R1, R2, and R3). These three randomized sheets were randomly
distributed in booklets of 41 each for each O. Finally, the poles on the
continua for R1, R2, and R3 were randomly positioned, so that one end (left
or right) of the continua was not always positive and/or negative.

All Os received standardized instructions (see Appendix A).

Phase II of the study was similar to Phase I, except an Intelligible- 
Unintelligible scale was added to the rating sheets. A different male and 
female speaker was used, thus bringing the total number of speakers to 
four: two males and two females. Also, Phase II eliminated ratings of 
time-compressed speech. Finally, 15 different listeners were used in
Phase II. 



Results--Phase I



Reliability of Ratings 



An intraclass correlation coefficient (McNemar, 1962) for the masculine-
feminine continuum was computed for the total group. The r_tt was found
to be .99.



M Values of Ratings by Conditions 

Table 4.1 lists the M scale values by condition, by set, and by sex of
speaker. Figures 4.1, 4.2, and 4.3 illustrate the values of Table 4.1
graphically.

As can be seen from Figure 4.1, the male and female speakers are
consistently (r = .99) rated as per their respective sex under time-compressed
speech. The high r suggests that the variations in the M ratings
by sets for time-compressed speech are systematic. There appears to
be a trend toward middle scale values for both speakers, the male showing
the trend sooner, but the female showing the trend more consistently,
especially at higher time-compressed speech ratios.

Figure 4.1. Graphic representation of listeners' M scaled values
of ratings of male and female TC vowels.

Figure 4.2. Graphic representation of listeners' M scaled values
of ratings of male and female FD-TR vowels.

Figure 4.3. Graphic representation of listeners' M scaled values
of ratings of male and female FD vowels.




TABLE 4.1

M VALUES OF SCALED MASCULINITY-FEMININITY OF LISTENER'S
RESPONSES TO MALE AND FEMALE TIME-COMPRESSED (TC),
FREQUENCY-DIVIDED (FD), AND FREQUENCY-DIVIDED
TIME-RESTORED (FD-TR) VOWEL IN H-D CONTEXT
FOR PHASE I

               TC                 FD                FD-TR
         Male   Female      Male   Female      Male   Female
  0%      1.0     6.4        1.0     6.4        1.0     6.4
 20%      1.7     6.4        1.7     4.2        1.5     3.0
 30%      1.5     6.5        1.3     3.0        1.5     2.9
 40%      1.9     6.0        1.4     2.6        1.1     1.5
 50%      1.7     6.1        1.3     1.5        1.2     1.3
 60%      2.2     6.2        1.3     1.5        1.3     1.7
 70%      1.5     5.7        --      --         --      --
 80%      2.0     5.8        --      --         --      --

The frequency-divided and frequency-divided time-restored conditions
(Figures 4.2 and 4.3 respectively) show more profound experimental
effects. From 20% on, under both conditions, the female appears to sound
masculine. This initial effect is greater under the frequency-divided
time-restored than frequency-divided condition. The frequency-divided
time-restored curve is also steeper than the frequency-divided curve.
Further, the frequency-divided time-restored maximum masculine rating
for the female speaker is attained at 40%, whereas the frequency-divided
maximum for the female is not attained until 50%. Finally, the frequency-
divided condition maximum masculine rating for the female appears more
stable than the frequency-divided time-restored maximum masculine
rating for the female speaker.
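
A rough sense of why the female speaker might begin to sound masculine can be had from the two fundamental frequencies reported under Experimental Materials (104 Hz and 198 Hz). The sketch below rests on an assumption that is not confirmed by the source: that an X% frequency-divided condition lowers every frequency, including the fundamental, to (100 - X)% of its original value. The function names are likewise illustrative.

```python
MALE_F0_HZ = 104
FEMALE_F0_HZ = 198
LEVELS = (20, 30, 40, 50, 60)  # percent


def divided_f0(f0_hz, percent):
    """Assumed model: X% division scales a frequency by (100 - X) / 100."""
    return f0_hz * (100 - percent) / 100


def first_level_at_or_below(f0_hz, target_hz, levels=LEVELS):
    """First level at which the shifted f0 reaches the target, or None."""
    for percent in levels:
        if divided_f0(f0_hz, percent) <= target_hz:
            return percent
    return None


if __name__ == "__main__":
    for percent in LEVELS:
        print(f"{percent}%: female f0 about {divided_f0(FEMALE_F0_HZ, percent):.1f} Hz")
    print("reaches the unprocessed male f0 at",
          first_level_at_or_below(FEMALE_F0_HZ, MALE_F0_HZ), "%")
```

If the assumed model holds, the female fundamental would fall to the neighborhood of the unprocessed male fundamental at about the 50% condition. Perceived speaker sex depends on more than fundamental frequency, so this is offered only as an illustration of the direction of the shift, not as an account of the ratings.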



Results--Phase II

The tentative results of this study suggest that a female speaker may not
be preferred under conditions of frequency-divided and frequency-divided
time-restored speech because of an effeminate perceptual quality after
her speech has been processed.



Reliability of Results 

Analyses of two scales were performed under Phase II: Masculine-
Feminine, Intelligible-Unintelligible. Reliability coefficients computed
for these data revealed an r_tt = .98 and r_tt = .87 respectively. Using
the Silverman Estimation Method (Silverman, 1968), it was found that
an additional five listeners would be required to raise the r_tt to .90 for
the Intelligible-Unintelligible scale.
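
The details of the Silverman Estimation Method are not given in this chapter. As a stand-in, and explicitly not as a reconstruction of that method, the familiar Spearman-Brown prophecy formula gives a projection of the same general kind: how reliability changes when the number of listeners is multiplied by a factor k. The function names and the choice of formula are assumptions; the inputs (.87, .90, and 15 listeners) come from the text.

```python
def spearman_brown(reliability, factor):
    """Projected reliability when the panel is multiplied by `factor`."""
    return factor * reliability / (1 + (factor - 1) * reliability)


def lengthening_factor(current, target):
    """Factor by which the panel must grow to reach the target reliability."""
    return target * (1 - current) / (current * (1 - target))


if __name__ == "__main__":
    listeners = 15
    factor = lengthening_factor(0.87, 0.90)
    print(f"required panel size: about {listeners * factor:.1f} listeners")
    print(f"projected reliability with 20 listeners: {spearman_brown(0.87, 20 / listeners):.3f}")
```

Under the Spearman-Brown assumption the projected panel is roughly 20 listeners, which is broadly in line with the five additional listeners reported, although agreement between the two methods should not be taken for granted.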



M Values of Ratings by Conditions 

As expected, similar findings were obtained on the Masculine-Feminine
scale in Phase II as were obtained in Phase I. One difference was that
the maximum masculine rating for the female for frequency-divided and
frequency-divided time-restored speech in Phase II was not reached until
60%. Table 4.2 and Figures 4.4 and 4.5 depict this information.



TABLE 4.2

M VALUES OF SCALED MASCULINITY-FEMININITY OF LISTENER'S
RESPONSES TO MALE AND FEMALE FREQUENCY-DIVIDED (FD),
AND FREQUENCY-DIVIDED TIME-RESTORED
(FD-TR) VOWEL IN AN H-D CONTEXT
FOR PHASE II

               FD                FD-TR
         Male   Female      Male   Female
  0%     1.40     6.5        1.4     6.5
 20%     1.20     3.9        1.4     3.5
 30%     1.20     3.1        1.2     3.3
 40%     1.26     1.8        1.1     2.6
 50%     1.26     1.8        1.1     1.7
 60%     1.20     1.5        1.3     1.3


Regarding the Intelligible-Unintelligible scale values, the frequency-
divided and frequency-divided time-restored conditions were both rated
highly intelligible through the 20% condition. For both conditions the first
major drop in intelligibility occurs at 30% for both sexes, the male speaker
showing a steeper slope than the female. The data reveal the male
speaker to be rated less intelligible than the female through the remaining
compression levels for both conditions. The frequency-divided time-
restored condition shows a more rapid decline in rated intelligibility than
does the frequency-divided condition. For the frequency-divided time-
restored condition the most dramatic drop occurs at 40% for the male,
at 50% for the female. Although the frequency-divided condition reveals
a more systematic decline in intelligibility, the frequency-divided
time-restored condition appears to stabilize at the higher compression
conditions (beyond 50% for both sexes). These data are summarized in Table
4.3 and Figures 4.6 and 4.7.

Figure 4.4. Graphic representation of listeners' M scaled values
of ratings of male and female FD vowels.

Figure 4.5. Graphic representation of listeners' M scaled values
of ratings of male and female FD-TR vowels.



TABLE 4.3

M VALUES OF SCALED INTELLIGIBLE-UNINTELLIGIBLE OF
LISTENER'S RESPONSES TO MALE AND FEMALE
FREQUENCY-DIVIDED (FD) AND FREQUENCY-
DIVIDED TIME-RESTORED (FD-TR) VOWELS
IN AN H-D CONTEXT FOR PHASE II

               FD                FD-TR
         Male   Female      Male   Female
  0%      6.3     6.2        6.3     6.2
 20%      6.7     6.2        6.1     6.4
 30%      4.0     5.5        4.5     5.4
 40%      3.1     4.4        2.7     4.8
 50%      1.9     3.5        1.8     2.4
 60%      1.3     2.5        2.3     2.4



Discussion



Time Compression 

From the results it can be concluded that speaker sex identification under 
even extreme conditions of time-compressed speech tends to remain stable. 
The graphic depiction of the time-compressed ratings also tends to vary 
about the same for both sexes. Zemlin et al. (1968) concluded that intel- 
ligibility was not equivalent to preference, that is, what may be most 
intelligible may not necessarily be what is preferred. It was felt that a 
reason for this might be related to speaker sex identification under var- 
ious conditions of time-compressed speech. The question is still to be 
resolved as to the essential differences between the Foulke (1966c) find- 
ings and those of Daniloff et al. (1968a). Further analysis of several 
of the other semantic differential scales used in this study is currently 
underway. 



Frequency-Divided and Frequency-Divided Time-Restored 

The results agree with Daniloff et al. (1968a) and Klumpp and Webster
(1961) in that the female frequency-divided and frequency-divided time-
restored speech is more intelligible than the male frequency-divided and
frequency-divided time-restored speech. Further agreement with the
Daniloff et al. (1968a) study is seen in that both studies reveal the first
major decline in intelligibility to be about 30% distortion. Also, the
frequency-divided time-restored speech revealed a rapid initial decline
in both studies, especially for the male speaker. Finally, both studies
reveal the most dramatic drop for the male frequency-divided time-
restored speech to be at 40%, for the female at 50%.

Figure 4.6. Graphic representation of listeners' M scaled values
of ratings of male and female FD vowels.

Figure 4.7. Graphic representation of listeners' M scaled values
of ratings of male and female FD-TR vowels.

The agreement between these studies relative to the
Intelligible-Unintelligible scales suggests that a listener is able to
judge adequately what is intelligible to him, and that this judgment
would be highly correlated to what would be revealed by traditional
intelligibility tests.

The conflicting results between the Daniloff et al. (1968a) study and
this study, that the female frequency-divided and frequency-divided
time-restored speech is more intelligible, and the Tiffany and Bennett
(1961) study, which showed a male preference, can be explained by the
results of this study. Apparently the female distorted speech begins to
take on a psychological male-like component, whereas the male speaker,
as expected, tends to remain stable. There is no social decision to be
made with respect to his distorted speech. These findings would support
the contention that phonetic quality and phonemic quality are not
equivalent. Further, phonetic quality may be based on a fixed vowel
hypothesis, whereas phonemic quality may be related to a relative vowel
hypothesis.

Further analyses of this data are being carried out. Further, physical
measurements, such as those performed by Terango (1966), are being
performed on all four speakers in order to physically account for the
gradual shift of the female to male frequency-divided and
frequency-divided time-restored speech. It is suspected that the female
frequency-divided and frequency-divided time-restored speech will
reveal that the M rate of pitch change during inflection will decrease
with increased distortion, as revealed by Terango (1966) when he
studied rated effeminate voices.

Finally, the results of the Like-Dislike scale should shed substantial 
light upon the preference / intelligibility controversy. 

There appears little doubt that if time-compressed, frequency-divided,
and frequency-divided time-restored processed speech is to be used
educationally, consideration must be given to more than simply
intelligibility. What an individual likes (prefers) to listen to may
have significant bearing on his progress.
APPENDIX A



INSTRUCTIONS FOR SCALING STUDY ON THE 
PERCEPTION OF DISTORTED SPEECH 



The purpose of this study is to study the feelings of people to various
types of speech. We hope to do this by having the people judge the speech
they hear against a series of descriptive scales. In taking this test,
please make the judgment on the basis of how you feel about the speech
signals you are to judge. On the dittoed sheet you will find seven
different scales. I will play a recorded tape. You will hear vowels in
an h-d context. There are 41 sets of 11 vowels each. Between each set of
11 vowels there is a silence of about 1 minute. During this silence
following each set, you are to rate the set on the seven scales, in order.

Here is how you are to use the scale: 

If you feel that the set of words you heard is very closely related to one 
end of the scale, you should place your checkmark as follows: 

fair X : : : : : : : unfair 



OR 



fair : : : : : : X : unfair 



If you feel that the set of words is quite closely related to one or the
other end of the scale (but not extremely), you should place your
checkmark as follows:

fair : X : : : : : : unfair 



OR 



fair : : : : : X : : unfair 



If the set of words seems only slightly related to one side or the other side
(but not really neutral), then you should check as follows:

fair : : X : : : : : unfair



OR 



fair : : : : X : : : unfair 



The direction toward which you check, of course, depends on which of the 
two ends of the scale seem most characteristic of the set of words you 
are judging. 

If you consider the set to be neutral on the scale, both sides of the scale 
equally associated with the set you are judging, then you should place your 
checkmark in the middle space: 

fair : : : X : : : : unfair 

IMPORTANT: 

(1) Place your checkmarks in the middle of the spaces,
not on the boundaries:

       : X :            :    X:

      (this)          (not this)

(2) Be sure to check every scale for every set of words--
do not omit any.

(3) Never put more than one checkmark on a single scale. 

(4) Remember, there’s only about a minute between sets, 
so work accurately but rapidly. 

Sometimes you may feel as though you've had the same set of words before 
on the test. This will not be the case, so do not look back at previous rat- 
ing sheets. Do not try to remember how you checked previous items on 
each scale: make each item a separate and independent judgment. Do not 
worry or puzzle over individual items. It is your first impression, the
immediate feeling about the sets of words we want. On the other hand,
do not be careless because we want your true impressions.

Are there any questions?

Do not begin your ratings until the set ends. 

Three beeps mean get ready, one beep means the end of the set. 

This is not a test of intelligibility. 





CHAPTER V 



DICHOTIC SPEECH-TIME COMPRESSION
Sanford E. Gerber and Robert J. Scott* 



In general, the Fairbanks procedure (Figure 5.1) or its German
equivalent has been the method of choice for various applications. The
main difficulty with time compressing speech in this way is that it
depends for its compression upon the discarding of information. If the
intelligibility is less than that achieved uncompressed, it is probably
due to the loss of information. Scott (1965, 1967b), making this
observation, hypothesized that restoring the information should restore
the intelligibility. We have now completed a series of studies to verify
this hypothesis.



Dichotic Compression 

Scott's (1965) procedure is called "dichotic" speech-time compression.
Recall that in the Fairbanks procedure the signal (and hence, the
information) in the discard interval is not recoverable, so could not be
made available to the listener. The differences between "diotic"
speech-time compression (i.e., Fairbanks' method) and "dichotic"
speech-time compression (i.e., Scott's method) are shown in Figure 5.2.
The diotically produced tape has one track which contains only the
(imaginary) odd-numbered segments, and these continuous segments are
heard in both ears. The dichotically produced tape has two tracks: one
track is identical to the diotic tape and is played to one ear only; the
second track contains only the (imaginary) even-numbered segments, and
this track is played only to the other ear. Notice that the first track
is delayed a bit so that it is offset in time with respect to the second
track. The second time segment no longer follows the first, but overlaps
it in time. The significance of the amount of overlap remains to be
investigated, but for all these experiments it has been 50% with respect
to either segment.
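
The two procedures just described can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not the
hybrid-computer program used in these studies: the function names are
hypothetical, the signal is a plain list of samples, and the delay of
half a segment applied to the first track is our reading of the "50%
overlap" mentioned above.

```python
def split_segments(samples, seg_len):
    """Cut the signal into consecutive segments of seg_len samples."""
    return [samples[i:i + seg_len] for i in range(0, len(samples), seg_len)]


def abut(segments):
    """Join segments end to end into one continuous track."""
    return [s for seg in segments for s in seg]


def diotic_2to1(samples, seg_len):
    """Keep only the odd-numbered segments (1st, 3rd, ...).

    The even-numbered segments are discarded, and the same single
    track is delivered to both ears.
    """
    track = abut(split_segments(samples, seg_len)[0::2])
    return track, track


def dichotic_2to1(samples, seg_len, silence=0):
    """Odd-numbered segments go to one ear, even-numbered to the other.

    The first track is delayed by half a segment (an assumption) so
    that the two tracks are offset in time; both tracks are padded
    with `silence` to a common length.
    """
    segs = split_segments(samples, seg_len)
    odd_track = abut(segs[0::2])
    even_track = abut(segs[1::2])
    delay = seg_len // 2
    first = [silence] * delay + odd_track
    second = even_track + [silence] * (len(first) - len(even_track))
    return first, second


if __name__ == "__main__":
    signal = list(range(1, 13))          # twelve samples, six segments of two
    print(diotic_2to1(signal, seg_len=2))
    print(dichotic_2to1(signal, seg_len=2))
```

In the diotic case half of the samples never reach the listener; in the
dichotic case every sample appears on one track or the other, which is
the sense in which the information is "restored."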



*Dr. Sanford E. Gerber is Assistant Professor of Speech and Director
of the Audiology Laboratory at the Speech and Hearing Center of the
University of California, Santa Barbara, California. Dr. Robert J.
Scott is a consultant with the U.S. Government, Washington, D.C.
Figure 5.1. Fairbanks' rotating head scheme.

Figure 5.2. 2:1 compression.

To create dichotic speech-time compression a hybrid computer system
has been used (Gerber, in press; Hogan & Scott, 1963). For the
research reported here, two different (but very similar) hybrid
computers have been employed. Figure 5.3 shows the hybrid system used
for the experiments up to the last one, and Figure 5.4 shows the system
used in the latest experiment. The input/output apparatus is
essentially the same in both cases. The older system uses a PDP-1
digital computer, while the newer one employs a PDP-7. The PDP-1 is a
somewhat larger but slower machine than the PDP-7; both are
manufactured by the Digital Equipment Corporation of Maynard,
Massachusetts. The analog portion of the hybrid is a Pace TR-10 analog
computer associated with the PDP-1 or an EAI 8800 in the case of the
PDP-7. The analog computers were made by Electronic Associates, Inc. of
Long Branch, New Jersey. The writers have been very fortunate to have
had these hybrid computer systems available for this research.



Experiment I 

Our first investigation of the intelligibility of dichotic speech-time com- 
pression dealt with the differences between dichotic and diotic for each 
of three compression ratios. The results of that study have been re- 
ported (Gerber, 1968) and need only be summarized here. 

The stimuli used in all the intelligibility experiments were Fairbanks'
recordings of the rhyme test words (cf. Fairbanks, 1958). The
recordings of the rhyme tests were input from the tape playback via the
analog computer interface to the analog-to-digital converter, which put
the digitized speech onto magnetic tape. Then, under operator control,
the computer time compressed the digitized speech and wrote this version
onto another magnetic tape. When the compression process had been
completed, the compressed digital tape was output via the
digital-to-analog converter onto audio tape. In this way all 250 items
of the Fairbanks recordings were compressed and dichotomized.

In the first experiment, we used three different compression ratios
(R = 2:1, 3:2, and 4:3) and three different discard intervals (I = 30,
40, and 50 milliseconds [msec]). Twenty listeners were employed.
For all listeners, the dichotic signals were more intelligible than
their diotic counterparts. Combining across both compression ratio and
discard interval, it was found that dichotic listening led to higher
intelligibility scores than diotic listening. For this aggregate of
dichotic signals, the average intelligibility exceeded 97%; while, for
the aggregate of diotic signals, the average intelligibility was just
over 93%. This difference was significant beyond the 0.01 level. This
means that the dichotic version did, indeed, restore the
intelligibility, presumably via the restoration of the otherwise
discarded information.
Figure 5.3. Hybrid computer system number one.

Figure 5.4. Hybrid computer system number two.

There were some other interesting findings from this first experiment.
We found no significant differences among the three compression ratios
when discard interval was not considered. Therefore, we could conclude
that the dichotic restoration of information was good to at least
double normal speed. Moreover, virtually no intelligibility was to be
gained by minimizing the amount of compression, for example, to only
4:3 or 3:2.

We did find differences with respect to discard interval. The discard
interval of 50 msec was significantly (at the 0.10 level) less
intelligible than the others, which did not differ significantly from
each other. It seems, then, that a discard of 50 msec is in some way
"too long." When the information was restored, that is, when dichotic
was compared with diotic, it was seen that the restoration was
significant for the 50 msec discard interval. The difference between
dichotic and diotic for this discard interval was nearly 9%. Therefore,
given a diotic signal, 50 msec of information is too much to miss at
one time.

Although many early experiments support the use of a discard interval
of about 40 msec for all compression ratios, we feel that for isolated
words intelligibility can be significantly increased by using a discard
interval as short as 15 to 20 msec, depending on the amount of
compression and the average fundamental frequency of the speaker. For
continuous speech, shorter discard intervals have been avoided
primarily because of the annoying effect of the interruption frequency.
As those of us working with computer-compressed speech have
experienced, the use of sampling intervals which do not preserve at
least one complete and continuous voicing period injects an artificial
monotone pitch, the frequency of which is inversely proportional to the
sampling period.
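
The inverse relation between sampling period and interruption frequency
can be illustrated with a little arithmetic. The sketch below is only an
illustration: it assumes one interruption per sampling period (so the
constant of proportionality is 1), and both function names and the
100 Hz fundamental used in the example are hypothetical.

```python
def interruption_frequency_hz(sampling_period_msec):
    """Frequency of the artificial pitch, assuming one interruption
    per sampling period: f = 1 / T, with T converted to seconds."""
    return 1000.0 / sampling_period_msec


def preserves_voicing_period(sampling_period_msec, fundamental_hz):
    """True if the sampling period is at least as long as one complete
    voicing period of a speaker with the given fundamental frequency."""
    voicing_period_msec = 1000.0 / fundamental_hz
    return sampling_period_msec >= voicing_period_msec


if __name__ == "__main__":
    for period in (15, 20, 40):
        print(period, "msec:", interruption_frequency_hz(period), "Hz")
    # A 100 Hz voice has a 10 msec voicing period.
    print(preserves_voicing_period(15, 100))
    print(preserves_voicing_period(5, 100))
```

Halving the sampling period doubles the interruption frequency, which is
one way to see why very short intervals become audible as a pitch of
their own.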

In general, we concluded from this first experiment that time-compressed
speech (up to double speed) is highly intelligible anyway, and
restoration of the information by Scott's dichotic technique is not only
feasible but desirable. His scheme of dichotic speech-time compression
restores the intelligibility of time-compressed speech essentially to
its uncompressed level.

Experiment II 

The results of Experiment I were very encouraging, but left some
questions unanswered. Speech compressed by means of discarding segments
leads to more listening possibilities than were investigated in
Experiment I. Reference to Figure 5.5 will show that there are four
listening possibilities. What we had called "diotic" referred to the
fact that the signal was heard with both ears when there was only one
signal. In the second experiment we decided to label this condition
"Unitary-Diotic," meaning one signal in two ears. The other
possibilities using this scheme are: Dichotic-Diotic, Dichotic-Monotic,
and Dichotic-Dichotic. In these terms, we had looked in the first
experiment only at Unitary-Diotic (one signal in both ears) and
Dichotic-Dichotic (two signals, one in each ear). We were, therefore,
unable to tell whence came the apparent improvement: from the
dichotomizing of the signal, or from the dichotomizing of the listener.

Figure 5.5. Listening possibilities.

The purpose of Experiment II was to determine the necessity of
listening dichotically to dichotic signals. Perhaps one ear could
process dichotic time-compressed speech as well as two since all the
information would have been restored in this case, too. The results of
this investigation are also in the literature (Gerber, 1969). It was
found that dichotic signals heard diotically were not more intelligible
than when heard monotically; there was no significant difference between
monotic and diotic listening conditions when the signal was dichotic.
Moreover, no preference for ear was observed in the monotic condition.
Using a dichotic signal it seems sure (and not at all surprising) that
intelligibility would be superior if the listening condition were also
dichotomized. That is to say, the highest intelligibility results when
the dichotic signal is presented one track to each ear. If both tracks
are presented to one or to both ears, intelligibility suffers.
Furthermore, if only one track is used (in one or both ears)
intelligibility suffers.

Experiment II, like Experiment I, revealed a significant difference
between 40 and 50 msec discard intervals. Again, intelligibility with
discards of 40 msec was greater than with discards of 50 msec, with
the compression ratio fixed at 2:1. The data of Experiment I caused
us to decide that ratios less than 2:1 were no longer interesting.

These two experiments led us to raise a question which has been asked
many times over the years that this process has been investigated.
Why is time-compressed speech less intelligible than uncompressed?
Is the loss of intelligibility due solely to the loss of information? Or
is it due to excessive rate demands upon the listener? Or both?



Experiment III 

It seemed reasonably clear after the first two experiments that the loss
of intelligibility must be due to the loss of information and not due to
the speed being too demanding for processing. The dichotic signal,
wherein all the information is present, was always more intelligible
than the diotic signal at the same rate. However, what happens when
the speed is more than doubled? If the compression ratio is greater
than 2:1, it is not possible to recover all the discarded information
(since we have only two ears), but it is possible to recover some of it.
If the intelligibility of time-compressed speech at greater than double
the normal speed is enhanced by dichotomizing, then the loss of
intelligibility must be attributable to the loss of information.
Furthermore, it is possible to restore the time without restoring the
information. If this time-restored version is not more intelligible
than the compressed version, then one must conclude that the time
demands are not excessive. Anyway, since the dichotic speech-time
compression at rates up to double the original was shown to suffer no
important loss of intelligibility, one wants to know how much
compression will cause intelligibility to diminish significantly.

This third experiment, then, was intended to answer these questions. 

For this experiment we prepared tapes of the Fairbanks Rhyme Test
(in the same manner as previously but with the newer hybrid computer)
at a compression ratio of 3:1. Figure 5.6 shows the imaginary segment
numbers available when these high compressions are recorded
dichotically. It is seen there that not all of the information is
restored by dichotomizing. By definition, a ratio of 3:1 diotic contains
one-third of the information in the original signal; dichotomizing
restores another third. Dichotic compression at 3:1, then, contains
two-thirds of the information of the uncompressed signal.
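
The bookkeeping behind these fractions can be written out explicitly. In
the sketch below (an illustration only; the function name and the use of
"tracks" for the number of distinct signals delivered are ours), a
single track at ratio R carries 1/R of the original, and each additional
track carries another 1/R until the whole signal is accounted for.

```python
from fractions import Fraction


def retained_fraction(ratio, tracks=1):
    """Fraction of the original signal that reaches the listener.

    ratio  -- compression ratio R, e.g. 3 for 3:1
    tracks -- 1 for diotic (one signal), 2 for dichotic (two signals)
    """
    if ratio < 1:
        raise ValueError("compression ratio must be at least 1")
    return min(Fraction(1), Fraction(tracks) / Fraction(ratio))


if __name__ == "__main__":
    for r in (2, 3, 4):
        print(f"{r}:1 diotic   -> {retained_fraction(r, tracks=1)}")
        print(f"{r}:1 dichotic -> {retained_fraction(r, tracks=2)}")
```

On this accounting a 3:1 diotic signal retains 1/3 and a 3:1 dichotic
signal retains 2/3, as stated above, while at 2:1 the dichotic signal
retains everything.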

Experiment III presented three different modes of compression to the
listeners. All the modes were at 3:1 with a discard interval of 40 msec.
The three modes were: dichotic, diotic, and time-restored. Reference
again to Figure 5.6 reveals that in order to restore the time but not the
information it is necessary to repeat the same segments used in the
diotic mode. If the compression ratio is 3:1, each segment is repeated
three times and only one-third of the segments are used.
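
Time restoration by repetition, as just described, can be sketched as
follows. This is a minimal illustration with a hypothetical function
name and a list of integers standing in for the signal; it is not the
program that produced the experimental tapes.

```python
def time_restore(samples, seg_len, ratio=3):
    """Restore the original duration without restoring information.

    Only every `ratio`-th segment is kept (the segments used in the
    diotic mode), and each kept segment is repeated `ratio` times.
    """
    segments = [samples[i:i + seg_len] for i in range(0, len(samples), seg_len)]
    kept = segments[0::ratio]
    restored = []
    for seg in kept:
        restored.extend(seg * ratio)   # repeat the same segment in place
    return restored


if __name__ == "__main__":
    signal = list(range(12))           # six segments of two samples each
    print(time_restore(signal, seg_len=2, ratio=3))
```

The output has the same length as the input but is built from only
one-third of its segments, which is why the procedure restores the time
frame and nothing else.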

The decision to restore the original time frame by repeating the diotic
(single file) compressed speech as in Figure 5.6 perhaps was not a
wise one. The results may have been more intelligible for the
time-restored words had the restoration been done dichotically.
Preliminary data from a current experiment suggest that 3:1 dichotically
restored words presented dichotically will prove to be more intelligible
than 3:1 dichotically compressed words. We initially believed that
diotically time-restored isolated words would be more intelligible than
diotically time-compressed words because of earlier experiments in
restoring the time of continuous speech. We now feel that repeating
already distorted sampling intervals in order to time-restore isolated
words tends to increase listener confusion, whereas this distortion
tends to be less effective in continuous speech.

Figure 5.7 shows the results of this investigation compared with those
of Experiment I. Most of our hypotheses have been verified by
Experiment III. Again, dichotic processing made a significant (< 0.01)
improvement over the intelligibility possible diotically; this
improvement was over 5%. Of even greater interest was the lack of
improvement resulting from time restoration. To restore the normal time
by repeating segments results in a very peculiar sounding signal; so
peculiar, in fact, that its intelligibility is significantly poorer than
even diotic (< 0.05) and much, much poorer than dichotic (< 0.001). So,
time restoration is not the answer; at least, not time restored in this
way because it introduces another kind of distortion.

Figure 5.6. 3:1 compression.

[Figure 5.7: horizontal axis labeled 2:1 and 3:1.]

We have not really resolved the issue of whether the loss of
intelligibility is due to the lack of information or to the press of
time. It is true that the dichotic signal even at 3:1 is really quite
intelligible, which continues to suggest that the problem rests in the
information and not the speed. The fact that the time-restored signal
was so poor lends some credence to this hypothesis, but the restored
speech has peculiarities of its own. We have resolved, however, that
listeners can process speech at a very high rate even when there is a
lack of information. The next study to be done may be the one which
resolves this issue. A 4:1 dichotic signal contains as much information
as a 2:1 diotic signal. If the losses are due solely to losses of
information, then these should have equal intelligibility. If not, then
4:1 may be "too fast." Meanwhile, we find that we are well within human
auditory processing time capabilities even at triple normal speed. The
premise that loss of intelligibility in time-compressed speech is due
primarily to the inability of the listener to process the speech at the
higher rate is certainly an attractive one. For if it were true, it
would suggest the possibility of training subjects in high-speed
listening.



CHAPTER VI 

A COMPARISON OF "DICHOTIC" SPEECH AND SPEECH
COMPRESSED BY THE ELECTROMECHANICAL
SAMPLING METHOD*

Emerson Foulke and E. McLean Wirth**



Recorded speech may be compressed in time by reproducing a succes- 
sion of periodic, time-abutted samples of the original recording. If 
the durations of the samples eliminated from such a reproduction are 
brief enough so that no critical feature of a speech signal can, by acci- 
dent of sampling, fall entirely within a discarded sample, the result 
is time-compressed, intelligible speech that is not altered with respect 
to vocal pitch or quality. 

Such sampling may be accomplished manually (Garvey, 1953b), by
cutting a recorded tape into segments, discarding some of the segments,
and splicing the remaining segments together again. It may be
accomplished more conveniently by a tape reproducer of the type
described by Fairbanks, Everitt, and Jaeger (1954). Devices of the
Fairbanks type reproduce periodic, time-abutted samples of a recorded
tape and, as before, the result is time-compressed, intelligible speech,
without distortion in vocal pitch or quality (Foulke, 1969).

*The research described in this report was also reported by the
junior author in her senior thesis, submitted to the Webster College,
St. Louis, Missouri, 1968. This report also appears as Chapter III
in The Comprehension of Rapid Speech by the Blind: Part III, Final
Progress Report, March 1, 1964 - June 30, 1968, Project No. 2430,
Grant No. OE-4-10-127, U.S. Department of Health, Education, and
Welfare, Office of Education, Bureau of Research. Non-Visual
Perceptual Systems Laboratory, Graduate School, University of
Louisville, Louisville, Kentucky, 1969.

**Dr. Emerson Foulke is Director of the Perceptual Alternatives
Laboratory and E. McLean Wirth is a former research assistant at
the Center for Rate-Controlled Recordings, University of Louisville,
Louisville, Kentucky 40208.

A computer may also be used for the time compression of speech
(Cramer, 1968; Scott, 1965). In this approach, the recorded speech
signal is temporally segmented, some of the time segments are
discarded according to a sampling rule for which the computer has been
programmed, and the remaining segments, abutted in time, are
reproduced as time-compressed speech.

In a scheme proposed by Scott (1967a), the signal resulting from the
process just described is applied to one earphone of a headset. The
samples that would have been discarded in the kind of compressed
speech described heretofore are retained, abutted in time, and
supplied to the other earphone. With this approach, for compressions in
time of 50% or less, all of the recorded signal is preserved in the
compressed reproduction. It is only rearranged temporally. For
compressions greater than 50%, some of the signal must be discarded,
but much more is preserved than when only one succession of samples
is reproduced. Scott calls the product of this process "dichotic
speech."
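
The 50% threshold can be checked with a short calculation. The sketch
below assumes that "compression of p%" means the reproduction lasts
(100 - p)% of the original playing time; the function names are
hypothetical and serve only to illustrate the arithmetic.

```python
from fractions import Fraction


def ratio_from_percent(percent):
    """Compression ratio R:1 implied by a compression of `percent`,
    assuming the compressed duration is (100 - percent)% of the original."""
    p = Fraction(percent, 100)
    if not 0 <= p < 1:
        raise ValueError("percent must be at least 0 and less than 100")
    return 1 / (1 - p)


def preserved_fraction(percent, files=2):
    """Share of the recorded signal preserved when `files` parallel
    successions of samples are reproduced (1 = single, 2 = double)."""
    p = Fraction(percent, 100)
    return min(Fraction(1), files * (1 - p))


if __name__ == "__main__":
    for pct in (30, 50, 60, 75):
        print(pct, ratio_from_percent(pct),
              preserved_fraction(pct, files=1),
              preserved_fraction(pct, files=2))
```

Up to 50% the two successions of samples together account for the whole
recording; beyond that point some of it is lost, though always twice as
much is preserved as with a single succession.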

When speech is compressed by an electromechanical compressor of the 
Fairbanks or Springer type, a single file of time-abutted samples is 
reproduced and this method will be referred to hereafter as the single 
file sampling method. When a computer is used to produce dichotic 
speech, two parallel files of time-abutted samples are reproduced, 
and this method will be referred to hereafter as the double file sam- 
pling method. 

When speech is compressed in time by discarding samples of the
original signal, as the length of samples is reduced, the probability is
reduced that a critical feature of a speech signal will fall entirely
within a discarded sample (Garvey, 1953b). In designing a speech
compressor, the physical parameters of the system must be adjusted
to produce discard samples, the durations of which are short enough
so that the probability of discarding a critical feature of a speech
signal can safely be ignored. Two types of speech compressors have
been developed for commercial distribution. One is based directly
upon the Fairbanks scheme.* The other, based directly upon the



*The speech compressor now manufactured by Mr. Wayne Graham,
Discerned Sound, 4459 Kraft Avenue, North Hollywood, California
91602, is based upon the Fairbanks design.



